A CONVERGENCE ANALYSIS OF A MULTI-LEVEL PROJECTED 
STEEPEST DESCENT ITERATION FOR NONLINEAR INVERSE 
PROBLEMS IN BANACH SPACES SUBJECT TO STABILITY 
CONSTRAINTS*

MAARTEN V. DE HOOP†, LINGYUN QIU‡, AND OTMAR SCHERZER§

Abstract. We consider nonlinear inverse problems described by operator equations in Banach 
spaces. Assuming conditional stability of the inverse problem, that is, assuming that stability holds 
on a closed, convex subset of the domain of the operator, we introduce a novel nonlinear projected 
steepest descent iteration and analyze its convergence to an approximate solution given limited 
accuracy data. We proceed with developing a multi-level algorithm based on a nested family of 
closed, convex subsets on which stability holds and the stability constants are ordered. Growth of 
the stability constants is coupled to the increase in accuracy of approximation between neighboring 
levels to ensure that the algorithm can continue from level to level until the iterate satisfies a desired 
discrepancy criterion, after a finite number of steps. 

1. Introduction. We consider nonlinear inverse problems described by operator 
equations in Banach spaces. Assuming conditional stability of the inverse problem, we 
introduce a nonlinear projected steepest descent iteration and analyze its convergence. 
We take the point of view of reconstructing an approximation of the solution to the 
inverse problem in a closed, convex subset of the domain on which the operator is 
defined and where the stability holds. Assuming that we can identify a nested sequence 
of closed, convex subsets on which the stability holds such that the stability constant 
grows in a controlled way, we then extend our analysis to a multi-level approach which 
mitigates this growth via successive approximation. We account also for the possibility 
that a parameter in the operator which defines the inverse problem, and changes the 
data, affects for a given closed, convex subset the accuracy of approximation, and the 
stability constant, as well. Our multilevel approach results in a radius of convergence 
which is significantly larger than the one in the single level approach. In our analysis, 
we incorporate inaccuracy of the data. Our analysis applies, for example, to electrical 
impedance tomography (EIT) and inverse boundary value problems for the Helmholtz 
equation using multiple frequencies. 

Initially, we consider a class of inverse problems defined by a nonlinear map from 
parameter or model functions to the data. The parameter functions and data are 
contained in certain Banach spaces. This situation can be modeled mathematically 
by the operator equation 

(1.1) $F(x) = y, \quad x \in \mathcal{D}(F), \quad y \in Y,$



*This research was supported in part by National Science Foundation grant CMG DMS-1025318 and
in part by the members of the Geo-Mathematical Imaging Group at Purdue University. The work
of OS has been supported by the Austrian Science Fund (FWF) within the national research networks
Photoacoustic Imaging in Biology and Medicine, project S10505, and Geometry and Simulation
S11704.

†Center for Computational and Applied Mathematics, Purdue University, West Lafayette, IN
47907 (mdehoop@purdue.edu).

‡Center for Computational and Applied Mathematics, Purdue University, West Lafayette, IN
47907 (qiu@purdue.edu).

§Computational Science Center, University of Vienna, Nordbergstr. 15, A-1090 Vienna, Austria
(otmar.scherzer@univie.ac.at).



with domain $\mathcal{D}(F) \subset X$, where X and Y are Banach spaces. We assume that F is
continuous, and that F is locally Fréchet differentiable. We do not assume that the
data are attainable, that is, y may not belong to the range of F. We assume that
there exists a closed, convex subset $Z \subset X$ such that

(1.2) $\Delta_p(x, \tilde{x}) \le \mathfrak{C}^p \|F(x) - F(\tilde{x})\|^p, \quad \forall x, \tilde{x} \in Z.$

Here $\Delta_p$ denotes the Bregman distance (defined below) and p > 1. This states
conditional Lipschitz stability of the inverse problem. Motivated by [16], we employ
a steepest descent iteration to approximate the solution of (1.1).
More precisely, we construct a sequence of parameter functions by a projected gradient
descent iteration with posterior stepsize.

In many inverse problems, logarithmic type stability is the optimal stability ob- 
tained with minimal assumptions on the domain or pre-image space; see, for example, 
[23]. By constraining the pre-image space, however, Lipschitz stability can be obtained;
for the case of EIT, see [3], [5] and for the case of inverse boundary value
problems for the Helmholtz equation, see [8] 0] . This is reflected by conditional stability
given in (1.2). The mentioned projected gradient descent iteration can then be
viewed as a projection regularization method, which is natural and avoids possibly 
artificial regularization techniques [20] . 

Our first main result concerns restricted convergence of the projected steepest 
descent iteration with a certain Lipschitz type stability condition on a closed, convex 
subset. Moreover, we prove monotonicity of the residuals defined by the sequence 
induced by the iteration. This result is related to two areas of iterative regulariza- 
tion, which are steepest descent algorithms for solving nonlinear inverse problems 
[2~il HZ51 HZ2] and projected iteration regularization techniques for the solution of in- 
verse problems with convexity constraints. The latter have been analyzed mostly in 
the context of linear inverse problems (see, for example, [17]) and later as accelerated
methods in [15]. Accelerated methods have been modified to nonlinear problems by 
[28] . The main differences of our work to the above mentioned papers are the condi- 
tions under which we prove convergence. In fact, instead of source and nonlinearity 
conditions (as in [24], [25]), we assume certain Hölder or Lipschitz stability of the inverse
problem. This is a novel viewpoint, which has been raised in [16]. The steepest
descent method proposed here is a generalization of the steepest descent method for 
unconstrained linear problems (see for example [Ml). It is however different from 
the generalization for nonlinear problems proposed in [24], [25], even for unconstrained
problems. 

Based on our first main result, we then introduce a multilevel algorithm. We 
assume that there are closed, convex subsets $\{Z_\alpha\}_{\alpha \in \mathbb{R}}$ of X, on which the restricted
operator $F_\alpha = F|_{Z_\alpha}$ exhibits a certain Hölder or Lipschitz type stability estimate
with stability constant $\mathfrak{C}_\alpha$, that is,

(1.3) $\Delta_p(x, \tilde{x}) \le \mathfrak{C}_\alpha^p \|F_\alpha(x) - F_\alpha(\tilde{x})\|^p, \quad \forall x, \tilde{x} \in Z_\alpha.$

In fact, $F_\alpha$ need not be a restriction of F only, but can also account for a varying
parameter in F which does affect the data. Here, we assume that $Z_{\alpha_1} \subset Z_{\alpha_2}$ and
$\mathfrak{C}_{\alpha_1} \le \mathfrak{C}_{\alpha_2}$ if $\alpha_1 < \alpha_2$. In the context of discretization methods, $Z_\alpha$ stands for a
finite-dimensional subspace of X and the number of basis vectors increases as $\alpha$ increases,
while the projection can be an orthogonal projection on $Z_\alpha$. In our second main
result, we introduce a condition on the stability constants and on the approximation 
errors between neighboring levels. These conditions between levels are coupled and 
guarantee that the result from the previous level is a proper starting point for the 
present level. Thus, the algorithm can continue from level to level until the desired 
discrepancy criterion is satisfied. 

2. Preliminaries. Several constants appear in the analysis. For the reader's
convenience we have grouped them as follows:

1. $\mathfrak{C}$ denotes a constant for the Lipschitz stability of the inverse mapping of F
(cf. (1.2), (3.3)),

2. $\hat{L}$ and $L$ are properties of the operator F (cf. (3.1) and (3.2)),

3. C and G with and without subscripts denote properties of the Banach space
(cf. (2.6), (2.7)).

2.1. Duality mappings. Let X and Y be Banach spaces. The duals of X and 
Y are denoted by X* and Y* , respectively. Their norms are denoted uniformly by 
$\|\cdot\|$. We denote the space of continuous linear operators $X \to Y$ by $\mathcal{L}(X,Y)$. Let
$F : \mathcal{D}(F) \subset X \to Y$ be continuous. Here $\mathcal{D}(F)$ denotes the domain of definition of the
nonlinear operator F. Let $h \in \mathcal{D}(F)$ and $k \in X$ and assume that $h + t(k - h) \in \mathcal{D}(F)$
for all $t \in (0, t_0)$ for some $t_0 > 0$; then we denote by DF(h)(k) the directional
derivative of F at $h \in \mathcal{D}(F)$ in direction $k \in \mathcal{D}(F)$, that is,

$DF(h)(k) := \lim_{t \to 0^+} \frac{F(h + tk) - F(h)}{t}.$

If $DF(h) \in \mathcal{L}(X,Y)$, then F is called Gâteaux differentiable at h. If, in addition, the
limit is uniform for all k belonging to a neighborhood of 0, F is called Fréchet differentiable
at h. For $x \in X$ and $x^* \in X^*$, we write the dual pair as $\langle x, x^* \rangle = x^*(x)$. For a
linear operator $A \in \mathcal{L}(X,Y)$, we write $A^*$ for the dual operator $A^* \in \mathcal{L}(Y^*,X^*)$ and
$\|A\| = \|A^*\|$ for the operator norm of A. We let $1 < p, q < \infty$ be conjugate exponents,
that is,

$\frac{1}{p} + \frac{1}{q} = 1.$

For p > 1, the subdifferential mapping $J_p = \partial f_p : X \to 2^{X^*}$ of the convex functional
$f_p : x \mapsto \frac{1}{p}\|x\|^p$ defined by

(2.1) $J_p(x) = \{x^* \in X^* \mid \langle x, x^* \rangle = \|x\| \, \|x^*\| \text{ and } \|x^*\| = \|x\|^{p-1}\}$

is called the duality mapping of X with gauge function $t \mapsto t^{p-1}$. Generally, the
duality mapping is set-valued. In order to let $J_p$ be single valued, we need to introduce
the notion of convexity and smoothness of Banach spaces.

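One defines the convexity modulus $\delta_X$ of X by

(2.2) $\delta_X(\epsilon) = \inf_{x, \tilde{x} \in X} \{1 - \|\tfrac{1}{2}(x + \tilde{x})\| \mid \|x\| = \|\tilde{x}\| = 1 \text{ and } \|x - \tilde{x}\| \ge \epsilon\}$

and the smoothness modulus $\rho_X$ of X by

(2.3) $\rho_X(\tau) = \sup_{x, \tilde{x} \in X} \{\tfrac{1}{2}(\|x + \tau\tilde{x}\| + \|x - \tau\tilde{x}\| - 2) \mid \|x\| = \|\tilde{x}\| = 1\}.$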



Definition 2.1. A Banach space X is said to be
(a) uniformly convex if $\delta_X(\epsilon) > 0$ for every $\epsilon \in (0, 2]$,

(b) uniformly smooth if $\lim_{\tau \to 0} \rho_X(\tau)/\tau = 0$,

(c) convex of power type p or p-convex if there exists a constant C > 0 such that
$\delta_X(\epsilon) \ge C\epsilon^p$,

(d) smooth of power type q or q-smooth if there exists a constant C > 0 such that
$\rho_X(\tau) \le C\tau^q$.

For a detailed introduction to the geometry of Banach spaces and the duality 
mapping, we refer to TU [STJ . We list the properties we need here in the following 
theorem. 

Theorem 2.2. Let p > 1. The following statements hold true: 

(a) For every $x \in X$, the set $J_p(x)$ is not empty and it is convex and weakly closed in
$X^*$.

(b) Theorem of Milman-Pettis: If a Banach space is uniformly convex, it is reflexive.

(c) A Banach space X is uniformly convex (resp. uniformly smooth) if and only if
$X^*$ is uniformly smooth (resp. uniformly convex).

(d) If a Banach space X is uniformly smooth, $J_p(x)$ is single valued for all $x \in X$.

(e) If a Banach space X is uniformly smooth and uniformly convex, $J_p$ is bijective
and the inverse $J_p^{-1} : X^* \to X$ is given by $J_p^{-1} = J_q^*$ with $J_q^*$ being the duality
mapping of $X^*$ with gauge function $t \mapsto t^{q-1}$, where $1 < p, q < \infty$ are conjugate
exponents.
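
As a concrete illustration of (2.1) and of statement (e), the following minimal sketch evaluates the duality mapping on the finite-dimensional sequence space $\ell^p$, where it acts componentwise as $x_i \mapsto |x_i|^{p-1}\operatorname{sign}(x_i)$. The choice of space, the function names and the sample vector are assumptions made for illustration only; they are not taken from the paper.

```python
import math

def p_norm(x, p):
    """The l^p norm of a finite vector."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

def duality_map(x, p):
    """Duality mapping of finite-dimensional l^p with gauge t -> t^(p-1).

    It is the gradient of (1/p)||x||_p^p, i.e. |x_i|^(p-1) * sign(x_i).
    """
    return [math.copysign(abs(xi) ** (p - 1.0), xi) for xi in x]

p = 3.0
q = p / (p - 1.0)                  # conjugate exponent, 1/p + 1/q = 1
x = [1.0, -2.0, 0.5]               # hypothetical sample vector

x_star = duality_map(x, p)         # element of the dual space l^q
pairing = sum(a * b for a, b in zip(x, x_star))
x_back = duality_map(x_star, q)    # J_q^* applied to J_p(x)
```

In this setting the pairing equals $\|x\|_p^p$, the dual norm of `x_star` equals $\|x\|_p^{p-1}$, and `x_back` recovers `x` up to rounding, in line with $J_p^{-1} = J_q^*$.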



2.2. Bregman distances. Because the geometrical characteristics of Banach
spaces are different from those of Hilbert spaces, it is often more appropriate to use
the Bregman distance instead of the conventional norm-based functionals $\|x - \tilde{x}\|^p$ or
$\|J_p(x) - J_p(\tilde{x})\|^p$ for convergence analysis. This idea goes back to Bregman [10].

Definition 2.3. Let X be a uniformly smooth Banach space and p > 1. The
Bregman distance $\Delta_p(x, \cdot)$ of the convex functional $x \mapsto \frac{1}{p}\|x\|^p$ at $x \in X$ is defined as

(2.4) $\Delta_p(x, \tilde{x}) = \frac{1}{p}\|\tilde{x}\|^p - \frac{1}{p}\|x\|^p - \langle J_p(x), \tilde{x} - x \rangle, \quad \tilde{x} \in X,$

where $J_p$ denotes the duality mapping of X with gauge function $t \mapsto t^{p-1}$. Note that
under the general assumptions of this paper the duality mapping $J_p$ is single valued.

In the following theorem, we summarize some facts concerning the Bregman dis- 
tance and the relationship between the Bregman distance and the norm [TJ [5J EJ US] • 

Theorem 2.4. Let X be a uniformly smooth and uniformly convex Banach space. 
Then, for all x, x € X , the following holds: 

(a)

(2.5) $\Delta_p(x, \tilde{x}) = \frac{1}{q}\|x\|^p + \frac{1}{p}\|\tilde{x}\|^p - \langle J_p(x), \tilde{x} \rangle.$



(c) $\Delta_p$ is continuous in both arguments.

(d) The following statements are equivalent:

(i) $\lim_{n \to \infty} \|x_n - x\| = 0$,
(ii) $\lim_{n \to \infty} \Delta_p(x_n, x) = 0$,
(iii) $\lim_{n \to \infty} \|x_n\| = \|x\|$ and $\lim_{n \to \infty} \langle J_p(x_n), x \rangle = \langle J_p(x), x \rangle$.

(e) If X is p-convex, there exists a constant $C_p > 0$ such that

(2.6) $\Delta_p(x, \tilde{x}) \ge \frac{C_p}{p}\|x - \tilde{x}\|^p.$

(f) If $X^*$ is q-smooth, there exists a constant $G_q > 0$ such that

(2.7) $\Delta_q(x^*, \tilde{x}^*) \le \frac{G_q}{q}\|x^* - \tilde{x}^*\|^q,$

for all $x^*, \tilde{x}^* \in X^*$.

The Bregman distance $\Delta_p$ is similar to a metric, but, in general, does not satisfy
the triangle inequality nor symmetry. In a Hilbert space, $\Delta_2(x, \tilde{x}) = \frac{1}{2}\|x - \tilde{x}\|^2$.
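
The two closing observations can be checked numerically. The sketch below evaluates the form (2.5) of the Bregman distance on finite-dimensional $\ell^p$; the space, the function name and the test vectors are assumptions chosen for illustration.

```python
import math

def bregman_distance(x, x_tilde, p):
    """Bregman distance in the form (2.5) on finite-dimensional l^p:
    (1/q)||x||^p + (1/p)||x_tilde||^p - <J_p(x), x_tilde>."""
    q = p / (p - 1.0)
    norm_x_p = sum(abs(a) ** p for a in x)          # ||x||_p^p
    norm_xt_p = sum(abs(b) ** p for b in x_tilde)   # ||x_tilde||_p^p
    j_x = [math.copysign(abs(a) ** (p - 1.0), a) for a in x]
    pairing = sum(j * b for j, b in zip(j_x, x_tilde))
    return norm_x_p / q + norm_xt_p / p - pairing

# Hilbert case p = 2: half the squared Euclidean distance.
d_hilbert = bregman_distance([1.0, 2.0], [0.0, -1.0], 2.0)

# p = 3: the two argument orders give different values (no symmetry).
d_forward = bregman_distance([1.0, 0.0], [2.0, 0.0], 3.0)
d_backward = bregman_distance([2.0, 0.0], [1.0, 0.0], 3.0)
```

For the vectors above, the Hilbert-space value is $\frac{1}{2}(1^2 + 3^2) = 5$, while for p = 3 the two orderings give 4/3 and 5/3.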

2.3. Bregman Projection. In this subsection, we briefly introduce the Bregman
projection and its properties, especially the non-expansiveness. A comprehensive
introduction to this topic, including a proof of Lemma 2.7, can be found in

Definition 2.5. Let X be a uniformly smooth Banach space and p > 1. Given a
closed convex set $Z \subset X$ and the Bregman distance $\Delta_p$, which is defined in Definition 2.3,
the Bregman projection of a point $x \in X$ onto Z is the point

(2.8) $P_Z(x) = \arg\min\{\Delta_p(x, z) \mid z \in Z\}.$



Definition 2.6. Let $T : X \to X$ be an operator. The point $z \in X$ is called a
non-expansivity pole of T if for every $x \in X$,

$\Delta_p(T(x), T(z)) + \Delta_p(x, T(x)) \le \Delta_p(x, z).$

An operator T, which has at least one non-expansivity pole, is called totally
non-expansive.

Lemma 2.7. Let X be a uniformly smooth Banach space and p > 1 and $Z \subset X$
be a closed convex subset. The following statements hold:

(a) The Bregman projection $P_Z$ is well defined;

(b) $P_Z$ is totally non-expansive and every point in Z is a non-expansivity pole of $P_Z$;

(c) For every $z \in Z$,

(2.9) $\Delta_p(P_Z(x), z) \le \Delta_p(x, z), \quad \forall x \in X.$

Throughout this paper, we assume that X is p-convex and q-smooth with p, q > 1, 
and hence it is uniformly smooth and uniformly convex. Furthermore, X is reflexive 
and its dual X* has the same properties. Y is allowed to be an arbitrary Banach 
space; $j_p$ will be a single-valued selection of the possibly set-valued duality mapping
of Y with gauge function $t \mapsto t^{p-1}$, p > 1. Further restrictions on X and Y will be
indicated in the respective theorems below. 

3. Convergence of a projected steepest descent iteration. Here, we assume
conditional stability, that is, stability when the operator F is restricted to a closed,
convex subset, Z, of X (see (3.3)). We introduce a projected steepest descent iteration
and analyze its convergence. In this section, we keep Z fixed. We are concerned
with an approximate solution, in Z, of the inverse problem subject to a discrepancy
principle.

Assumption 3.1. Let

$\mathcal{B} = \mathcal{B}^\Delta_\rho(z^\dagger) = \{x \in X \mid \Delta_p(x, z^\dagger) \le \rho\} \subset \mathcal{D}(F)$

for some $\rho > 0$, where $\rho$ here will come into play as a convergence radius and $z^\dagger$ is
defined below.

(a) The Fréchet derivative, DF, of F is Lipschitz continuous on $\mathcal{B}$ and

(3.1) $\|DF(x)\| \le \hat{L} \quad \forall x \in \mathcal{B},$

(3.2) $\|DF(x) - DF(\tilde{x})\| \le L\|x - \tilde{x}\| \quad \forall x, \tilde{x} \in \mathcal{B}.$

(b) F is weakly sequentially closed, i.e.,

$x_n \rightharpoonup x, \ F(x_n) \to y \quad \Longrightarrow \quad x \in \mathcal{D}(F), \ F(x) = y.$

(c) Let Z denote a closed, convex subset of X. The inversion has the uniform Lipschitz
type stability for elements in Z, i.e., there exists a constant $\mathfrak{C} > 0$ such
that

(3.3) $\Delta_p(x, \tilde{x}) \le \mathfrak{C}^p \|F(x) - F(\tilde{x})\|^p \quad \forall x, \tilde{x} \in \mathcal{B} \cap Z.$

For given data $y \in Y$, we assume that

(3.4) $\mathrm{dist}(y, F(Z)) \le \eta,$

for some $\eta > 0$. Note that F is continuous and Z is closed. Hence there must exist a
$z^\dagger \in Z$ such that

(3.5) $\|F(z^\dagger) - y\| = \mathrm{dist}(y, F(Z)).$

Note that this condition also accounts for data errors.

The stopping index $K = K(\hat{\eta})$ of the following iteration is determined by a
discrepancy principle

(3.6) $K(\hat{\eta}) := \min\{k \in \mathbb{N} \mid \|F(x_k) - y\| \le \hat{\eta}\}$

with a fixed

(3.7) $\hat{\eta} > 3\eta.$
We introduce the following algorithm: 



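Algorithm 3.2. We fix some abbreviations first: For $x_k$, k = 0, 1, 2, ..., fixed,
denote

(3.8) $R_k = F(x_k) - y, \quad T_k = DF(x_k)^* j_p(F(x_k) - y), \quad r_k = \|R_k\|, \quad t_k = \|T_k\|.$

Moreover, we define

(3.9) $\mathcal{C} := \frac{1}{2}\left(\frac{p}{C_p}\right)^{2/p} L \, \mathfrak{C}^2$

and for k = 0, 1, ...

(3.10)
$\ell_k := G_q t_k^q,$
$u_k := -\mathcal{C} r_k^2 + (1 - 2\mathcal{C}\eta) r_k - \eta - \mathcal{C}\eta^2,$
$v_k := \ell_k^{1-p} u_k^{p-1} r_k^{p^2-p}(r_k - \eta) - \frac{1}{q}\ell_k^{1-p} u_k^{p} r_k^{p^2-p},$
$\mu_k := \left(\frac{u_k r_k^{p-1}}{\ell_k}\right)^{1/(q-1)} = \ell_k^{1-p} u_k^{p-1} r_k^{(p-1)^2},$
$w_k := \frac{L}{2}\left(\frac{p}{C_p}\right)^{2/p} \ell_k^{1-p} u_k^{p-1} r_k^{p^2-p}.$

Now, the main steps of the algorithm:

(S0) Choose a starting point $x_0 \in Z$ such that

(3.11) $\Delta_p(x_0, z^\dagger) \le \rho := \frac{C_p}{p}(2\hat{L}\mathcal{C})^{-p}\left(1 + \sqrt{1 - 8\mathcal{C}\eta} - 4\eta\mathcal{C}\right)^p,$

where $z^\dagger$ is specified in Theorem 3.3 below.

(S1) Compute the new iterate via

(3.12) $\tilde{x}_{k+1} = J_q^*(J_p(x_k) - \mu_k T_k),$
$x_{k+1} = P_Z(\tilde{x}_{k+1}).$

Set $k \leftarrow k + 1$ and repeat step (S1).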
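
To make the steps above concrete, here is a minimal sketch of the iteration in the Hilbert space setting p = q = 2, where $J_p$ is the identity, $C_p = G_q = 1$, the Bregman projection is the metric projection, and the step size reduces to $\mu_k = u_k r_k / t_k^2$. Everything specific in it is an assumption made for illustration: the toy forward map $F(x)_i = x_i + 0.1 x_i^2$, the box $Z = [0,1]^2$, and the constants $L = 0.2$, $\mathfrak{C}^2 = 1/2$, hence $\mathcal{C} = L\mathfrak{C}^2 = 0.1$.

```python
import math

def forward(x):
    """Toy forward map (assumed for illustration): F(x)_i = x_i + 0.1 x_i^2."""
    return [xi + 0.1 * xi * xi for xi in x]

def adjoint_derivative(x, r):
    """DF(x)^* r; DF(x) is diagonal with entries 1 + 0.2 x_i."""
    return [(1.0 + 0.2 * xi) * ri for xi, ri in zip(x, r)]

def project_box(x):
    """Metric projection onto Z = [0, 1]^n (the Bregman projection for p = 2)."""
    return [min(max(xi, 0.0), 1.0) for xi in x]

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

def projected_steepest_descent(x0, y, eta, eta_hat, c_int, max_iter=100):
    """Steps (S0)-(S1) for p = q = 2, stopped by the discrepancy principle (3.6).

    Returns the final iterate, the number of steps taken and the last residual.
    """
    x = list(x0)
    for k in range(max_iter + 1):
        R = [fi - yi for fi, yi in zip(forward(x), y)]
        r = norm(R)
        if r <= eta_hat or k == max_iter:
            return x, k, r
        T = adjoint_derivative(x, R)          # j_2 is the identity
        t = norm(T)
        u = -c_int * r * r + (1.0 - 2.0 * c_int * eta) * r - eta - c_int * eta * eta
        if u <= 0.0 or t == 0.0:              # outside the regime of the analysis
            return x, k, r
        mu = u * r / (t * t)                  # step size for p = q = 2, G_q = 1
        x = project_box([xi - mu * ti for xi, ti in zip(x, T)])

x_true = [0.5, 0.8]                           # hypothetical exact solution in Z
y = forward(x_true)                           # attainable data
x_K, K, r_K = projected_steepest_descent([0.0, 0.0], y, eta=0.01,
                                         eta_hat=0.05, c_int=0.1)
```

With these choices $8\mathcal{C}\eta = 0.008 < 1$ and $\hat{\eta} = 0.05 > 3\eta$, so the sketch operates in the parameter range of (3.7) and (3.11); it does not verify the radius condition (3.11) itself.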

Theorem 3.3. Let Assumption 3.1 hold true. Moreover, assume that the estimate
(3.4) holds for some positive constant $\eta \in (0, (8\mathcal{C})^{-1})$ and $z^\dagger \in Z$.

Then Algorithm 3.2 stops after a finite number $K = K(\hat{\eta})$ of iterations with the
discrepancy criterion

$r_K = \|F(x_K) - y\| \le \hat{\eta}$

being satisfied, and strict monotonicity of the Bregman distance

(3.13) $\Delta_p(x_{k+1}, z^\dagger) \le \Delta_p(x_k, z^\dagger) + w_k \Delta_p(x_k, z^\dagger)^{2/p} - v_k$

holds with

$w_k \Delta_p(x_k, z^\dagger)^{2/p} - v_k < 0,$

for all $k \le K(\hat{\eta}) - 1$.



Fig. 1. Projected steepest descent iteration 



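Proof. We use the same abbreviations as in Algorithm 3.2.
We start with a collection of elementary estimates that will be used frequently
afterwards. With the abbreviations defined in (3.10), (3.9), inequalities (2.6) and
(3.3) yield

(3.14) $\frac{L}{2}\|x_k - z^\dagger\|^2 \le \frac{L}{2}\left(\frac{p}{C_p}\right)^{2/p}\Delta_p(x_k, z^\dagger)^{2/p} \le \mathcal{C}\|F(x_k) - F(z^\dagger)\|^2$
$\le \mathcal{C}(r_k + \|F(z^\dagger) - y\|)^2$
$\le \mathcal{C} r_k^2 + 2\mathcal{C}\eta r_k + \mathcal{C}\eta^2$
$= r_k - u_k - \eta.$

With the mean value inequality and (2.6), it follows that

(3.15) $r_k \le \|F(x_k) - F(z^\dagger)\| + \eta \le \hat{L}\left(\frac{p}{C_p}\Delta_p(x_k, z^\dagger)\right)^{1/p} + \eta.$

Using the definition of $\mu_k$, it follows that for k = 0, 1, ...,

(3.16) $\frac{G_q}{q}\mu_k^q t_k^q = \frac{1}{q}\ell_k^{1-p} u_k^{p} r_k^{p^2-p}, \quad \mu_k r_k^{p-1} = \ell_k^{1-p} u_k^{p-1} r_k^{p^2-p}.$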

Now, we start with the main body of the proof: We claim that

$\Delta_p(x_m, z^\dagger) \le \rho, \quad m = 0, 1, \ldots, K,$

which we prove by induction. Assume the induction hypothesis that

$\Delta_p(x_k, z^\dagger) \le \rho.$

Note that (3.11) gives the base case. With (3.15), we have that

(3.17) $r_k \le \hat{L}\left(\frac{p\rho}{C_p}\right)^{1/p} + \eta = \frac{1 + \sqrt{1 - 8\mathcal{C}\eta}}{2\mathcal{C}} - \eta.$

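Note that we can rewrite

$u_k = -\mathcal{C}\left(r_k - \frac{1 - \sqrt{1 - 8\mathcal{C}\eta}}{2\mathcal{C}} + \eta\right)\left(r_k - \frac{1 + \sqrt{1 - 8\mathcal{C}\eta}}{2\mathcal{C}} + \eta\right).$

Then, (3.17), combined with the fact that

$r_k > \hat{\eta} > 3\eta \ge \frac{1 - \sqrt{1 - 8\mathcal{C}\eta}}{2\mathcal{C}} - \eta,$

gives the positiveness of $u_k$. Note that this leads to the positiveness of $v_k$ as follows:

$v_k > \ell_k^{1-p} u_k^{p-1} r_k^{p^2-p}(r_k - \eta - u_k) \ge 0,$

where the last inequality is a consequence of (3.14).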



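Using (2.5) and (2.1) we obtain, for the sequence of residues,

(3.18) $\Delta_p(\tilde{x}_{k+1}, z^\dagger) = \Delta_p(x_k, z^\dagger) + \frac{1}{q}\left(\|\tilde{x}_{k+1}\|^p - \|x_k\|^p\right) - \langle J_p(\tilde{x}_{k+1}) - J_p(x_k), z^\dagger \rangle$
$= \Delta_p(x_k, z^\dagger) + \frac{1}{q}\left(\|J_p(\tilde{x}_{k+1})\|^q - \|J_p(x_k)\|^q\right) - \langle J_p(\tilde{x}_{k+1}) - J_p(x_k), z^\dagger \rangle.$

Applying (2.5) and (f) of Theorem 2.4 with $x^* = J_p(\tilde{x}_{k+1})$ and $\tilde{x}^* = J_p(x_k)$, we get

$\frac{1}{q}\left(\|J_p(\tilde{x}_{k+1})\|^q - \|J_p(x_k)\|^q\right) \le \frac{G_q}{q}\|J_p(\tilde{x}_{k+1}) - J_p(x_k)\|^q + \langle J_p(\tilde{x}_{k+1}) - J_p(x_k), x_k \rangle.$

Substituting (3.12) and using this inequality in (3.18) yields

(3.19) $\Delta_p(\tilde{x}_{k+1}, z^\dagger) - \Delta_p(x_k, z^\dagger) \le \frac{G_q}{q}\|J_p(\tilde{x}_{k+1}) - J_p(x_k)\|^q + \langle J_p(\tilde{x}_{k+1}) - J_p(x_k), x_k - z^\dagger \rangle$
$= \frac{G_q}{q}\mu_k^q t_k^q - \mu_k \langle T_k, x_k - z^\dagger \rangle.$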


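We estimate the second term in (3.19). Using (2.6) and the Lipschitz type stability
(3.3), and (3.5), we find that

(3.20) $-\langle T_k, x_k - z^\dagger \rangle = -\langle j_p(R_k), DF(x_k)(x_k - z^\dagger) \rangle$
$= -\langle j_p(R_k), R_k \rangle + \langle j_p(R_k), F(z^\dagger) - y \rangle + \langle j_p(R_k), F(x_k) - F(z^\dagger) - DF(x_k)(x_k - z^\dagger) \rangle$
$\le -r_k^{p-1}\left(r_k - \eta - \frac{L}{2}\|x_k - z^\dagger\|^2\right).$

From (3.19) and (3.20), it follows that, for k = 0, 1, 2, ...,

(3.21) $\Delta_p(\tilde{x}_{k+1}, z^\dagger) - \Delta_p(x_k, z^\dagger) \le \frac{G_q}{q}\mu_k^q t_k^q - \mu_k r_k^{p-1}\left(r_k - \eta - \frac{L}{2}\|x_k - z^\dagger\|^2\right),$

and hence, by (3.21), (2.6) and the non-expansiveness of the Bregman projection (2.9),
we arrive at

(3.22) $\Delta_p(x_{k+1}, z^\dagger) - \Delta_p(x_k, z^\dagger) \le \Delta_p(\tilde{x}_{k+1}, z^\dagger) - \Delta_p(x_k, z^\dagger)$
$\le \frac{G_q}{q}\mu_k^q t_k^q - \mu_k r_k^{p-1}(r_k - \eta) + \frac{L}{2}\left(\frac{p}{C_p}\right)^{2/p}\mu_k r_k^{p-1}\Delta_p(x_k, z^\dagger)^{2/p}.$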

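Using the identities in (3.16) and abbreviations (3.10), (3.9) we derive that

(3.23) $\Delta_p(x_{k+1}, z^\dagger) - \Delta_p(x_k, z^\dagger) \le -\ell_k^{1-p} u_k^{p-1} r_k^{p^2-p}(r_k - \eta) + \frac{1}{q}\ell_k^{1-p} u_k^{p} r_k^{p^2-p} + \frac{L}{2}\left(\frac{p}{C_p}\right)^{2/p}\ell_k^{1-p} u_k^{p-1} r_k^{p^2-p}\Delta_p(x_k, z^\dagger)^{2/p}$
$= -v_k + w_k \Delta_p(x_k, z^\dagger)^{2/p}.$

We finish the proof of the monotonicity of $\Delta_p(x_k, z^\dagger)$ by showing that

$-v_k + w_k \Delta_p(x_k, z^\dagger)^{2/p} < 0.$

In fact, by (3.14),

(3.24) $w_k \Delta_p(x_k, z^\dagger)^{2/p} = \frac{L}{2}\left(\frac{p}{C_p}\right)^{2/p}\Delta_p(x_k, z^\dagger)^{2/p}\,\ell_k^{1-p} u_k^{p-1} r_k^{p^2-p}$
$\le \mathcal{C}(r_k + \eta)^2 \ell_k^{1-p} u_k^{p-1} r_k^{p^2-p} = (r_k - \eta - u_k)\,\ell_k^{1-p} u_k^{p-1} r_k^{p^2-p}.$

Hence

(3.25) $-v_k + w_k \Delta_p(x_k, z^\dagger)^{2/p} \le -v_k - \ell_k^{1-p} u_k^{p} r_k^{p^2-p} + (r_k - \eta)\,\ell_k^{1-p} u_k^{p-1} r_k^{p^2-p}$
$= -\frac{1}{p}\ell_k^{1-p} u_k^{p} r_k^{p^2-p} < 0.$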



The above monotonicity of $\Delta_p(x_k, z^\dagger)$ with the induction hypothesis completes the
induction.

It is left to show that Algorithm 3.2 stops after a finite number of iterations (i.e.,
$K(\hat{\eta})$ iterations). We prove this by contradiction. Suppose that Algorithm 3.2 does
not stop within a finite number of iterations and, hence,

(3.26) $r_k > \hat{\eta}, \quad \forall k \ge 0.$

Then, from the monotonicity of the Bregman distances (3.13) and (3.25), we have
that

$0 \le \Delta_p(x_k, z^\dagger) \le \Delta_p(x_0, z^\dagger) - \frac{1}{p}\sum_{n=0}^{k-1}\ell_n^{1-p} u_n^{p} r_n^{p^2-p}, \quad \forall k > 0.$

It follows that

$\sum_{n=0}^{\infty}\ell_n^{1-p} u_n^{p} r_n^{p^2-p} < \infty$

and hence that $u_k$ converges to 0 as k goes to infinity. By writing

$u_k = -\mathcal{C}\left(r_k - \frac{1 - \sqrt{1 - 8\mathcal{C}\eta}}{2\mathcal{C}} + \eta\right)\left(r_k - \frac{1 + \sqrt{1 - 8\mathcal{C}\eta}}{2\mathcal{C}} + \eta\right)$

we have that

$\lim_{k \to \infty} r_k = \frac{1 - \sqrt{1 - 8\mathcal{C}\eta}}{2\mathcal{C}} - \eta \le 3\eta < \hat{\eta},$

which is a contradiction.
□



Remark 3.4. We refer to Algorithm 3.2 as a steepest descent algorithm in the
sense that it is a generalization of the steepest descent algorithm for linear inverse
problems. Indeed, let F be linear and assume that we have an unconstrained problem.
Then both $\mathcal{C}$ and $\eta$ can be chosen to be equal to zero. Then we have

$\mu_k = \left(\frac{r_k^p}{t_k^q G_q}\right)^{1/(q-1)}, \quad k = 0, 1, 2, \ldots,$

with $r_k = \|F x_k - y\|$ and $t_k = \|F^* j_p(F x_k - y)\|$. In particular, for a Hilbert space
setting, where

$p = q = 2, \quad C_p = G_q = 1, \quad J_p = J_q = \mathrm{Id},$

we get

$\mu_k = \frac{r_k^2}{t_k^2}, \quad k = 0, 1, 2, \ldots,$

which is the standard parameter choice of the steepest descent method [18]. See also
[19] for efficient adaptations of the Landweber iteration.

In the Hilbert space setting, moreover, the condition S3.11\) requires that £<-!-, 
which in some sense restricts the curvature. Note that for p — 2 we have pv^yp ~ 

\\x - x\\ 2 /\\F(x) - F{x\\ 2 < £ and therefore |j < ££ ; where \\F'[x)\\ denotes 
the operator norm of a directional derivative in direction x — x, and F" is the sec- 
ond derivative in the same direction. Thus condition A3.ll]) can be interpreted as a 
curvature to size condition ( see for the curvature to size concept for variational 
regularization) . 

Remark 3.5. We refer to (3.11) as a generalized radius of convergence from
the nonlinear Landweber iteration to a steepest descent algorithm in Banach spaces.
Indeed, let $\eta$ be equal to zero. Then (3.11) can be reduced to

$\Delta_p(x_0, z^\dagger) \le \rho = \frac{C_p}{p}(\hat{L}\mathcal{C})^{-p} = 2^p\left(\frac{C_p}{p}\right)^{3}\left(\frac{1}{\hat{L} L \mathfrak{C}^2}\right)^{p},$

which coincides with the convergence radius for the nonlinear Landweber iteration in
Banach spaces [16].

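4. Extension to a multi-level algorithm. In this section, we consider a set,
$\{Z_\alpha\}_{\alpha \ge 0}$, of closed and convex subsets of X, and an operator family $\{F_\alpha\}_{\alpha \ge 0}$, where
$F_\alpha$ is obtained as $F_\alpha = F|_{Z_\alpha}$, or approximations of F. We let

$\mathcal{B} = \mathcal{B}^\Delta_{\rho_0}(x^\dagger) = \{x \in X \mid \Delta_p(x, x^\dagger) \le \rho_0\} \subset \mathcal{D}(F)$

for some $\rho_0 > 0$, which is specified in Theorem 4.4, and invoke

Assumption 4.1.

(a) F is weakly sequentially closed, that is,

$x_n \rightharpoonup x, \ F(x_n) \to y \quad \Longrightarrow \quad x \in \mathcal{D}(F), \ F(x) = y.$

(b) The Fréchet derivative, $DF_\alpha$, of $F_\alpha$ is Lipschitz continuous on $\mathcal{B} \cap Z_\alpha$ and

(4.1) $\|DF_\alpha(x)\| \le \hat{L}_\alpha \quad \forall x \in \mathcal{B} \cap Z_\alpha,$

(4.2) $\|DF_\alpha(x) - DF_\alpha(\tilde{x})\| \le L_\alpha\|x - \tilde{x}\| \quad \forall x, \tilde{x} \in \mathcal{B} \cap Z_\alpha.$

(c) The inversion has the uniform Lipschitz type stability for elements in $Z_\alpha$, that is,
there exists a constant $\mathfrak{C}_\alpha > 0$ such that

(4.3) $\Delta_p(x, \tilde{x}) \le \mathfrak{C}_\alpha^p\|F_\alpha(x) - F_\alpha(\tilde{x})\|^p \quad \forall x, \tilde{x} \in \mathcal{B} \cap Z_\alpha.$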

For the stability constants, and the approximation error, $\{\eta_\alpha\}$, we introduce

Assumption 4.2.

(a) Let $\eta_\alpha = \eta_\alpha(y)$ be defined by

$\eta_\alpha = \mathrm{dist}(y, F_\alpha(Z_\alpha)), \quad y \in Y.$

Moreover, we assume that $\eta_\alpha$ is non-negative and monotonically decreasing with
respect to $\alpha$ for every fixed $y \in Y$.

(b) If $Z_{\alpha_1} \subset Z_{\alpha_2}$ then $\mathfrak{C}_{\alpha_1} \le \mathfrak{C}_{\alpha_2}$.

(c) If $\alpha_1 < \alpha_2$ then $Z_{\alpha_1} \subset Z_{\alpha_2}$ and therefore also $\eta_{\alpha_1} \ge \eta_{\alpha_2}$.

Typically, the subsets Z a are finite dimensional and the stability constant for the 
inversion grows with the dimension of these subsets. The nature of our multi-level 
algorithm is intimately connected to finding sparse, albeit approximate, representa- 
tions of the solution to the inverse problem, mitigating the mentioned growth of the 
stability constants. Indeed, the objective is very similar to multi-level techniques for 
solving inverse problems [551 HOI HI] > where one exploits that the finite-dimensional 
problems are stable and that the outcome of an iteration on a coarse level gives a 
good initial guess on a finer level. In this section, we combine any known controllable 
factors to an abstract index a of the operator family and design a progressive iteration 
method with the aid of the result from the previous section. 




In the following algorithm, we refer to the parameter $\alpha$ as an index and only
nonnegative integer valued $\alpha$ is considered.

Algorithm 4.3.

(S0) Use $x_{0,0}$ as the starting point. Set n = 0.

(S1) Iteration. Use $F_n$ and $Z_n$ as the modelling operator and convex subset to run
Algorithm 3.2 with the discrepancy criterion given by

(4.4) $K_n = \min\{k \in \mathbb{N} \mid \|F_n(x_{n,k}) - y\| \le (3 + \varepsilon)\eta_n\},$

where $\varepsilon > 0$ is a given uniform tolerance constant.
STOP, if n = N, a given number.

(S2) Set $x_{n+1,0} = x_{n,K_n}$, n = n + 1 and go to step (S1).



This algorithm is illustrated in Figure [5] 
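
The level-to-level structure of Algorithm 4.3 can be sketched in a few lines for the Hilbert space case p = q = 2. All specifics below are assumptions made for illustration and do not come from the paper: the toy forward map $F(x)_i = x_i + 0.1 x_i^2$, the nested subsets $Z_n$ (first `m` coordinates in [0, 1], the rest zero), the constant $\mathcal{C} = 0.1$ used on every level, and approximation errors $\eta_n$ computed from a known exact solution plus a small offset `delta`.

```python
import math

def forward(x):
    """Toy forward map (assumed): F(x)_i = x_i + 0.1 x_i^2."""
    return [xi + 0.1 * xi * xi for xi in x]

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

def project(x, m):
    """Projection onto Z_m: first m coordinates clipped to [0, 1], the rest zero."""
    return [min(max(xi, 0.0), 1.0) if i < m else 0.0 for i, xi in enumerate(x)]

def run_level(x, y, m, eta, tol, c_int=0.1, max_iter=200):
    """One call of the single-level iteration (p = q = 2) on Z_m, stopped by (4.4)."""
    for k in range(max_iter):
        R = [fi - yi for fi, yi in zip(forward(x), y)]
        r = norm(R)
        if r <= tol:
            return x, k
        T = [(1.0 + 0.2 * xi) * ri for xi, ri in zip(x, R)]   # DF(x)^* R
        t = norm(T)
        u = -c_int * r * r + (1.0 - 2.0 * c_int * eta) * r - eta - c_int * eta * eta
        if u <= 0.0 or t == 0.0:
            break                               # outside the regime of the analysis
        mu = u * r / (t * t)
        x = project([xi - mu * ti for xi, ti in zip(x, T)], m)
    raise RuntimeError("level did not reach its discrepancy criterion")

def multilevel(x00, y, dims, etas, eps=1.0):
    """Steps (S0)-(S2): warm-start each level with the result of the previous one."""
    x, steps = list(x00), []
    for m, eta in zip(dims, etas):
        x, k = run_level(x, y, m, eta, (3.0 + eps) * eta)
        steps.append(k)
    return x, steps

x_true = [0.9, 0.1, 0.01, 0.005]                # hypothetical exact solution
y = forward(x_true)
dims = [1, 2, 4]                                # nested subsets Z_0, Z_1, Z_2
delta = 0.01                                    # assumed offset, keeps eta_n > 0
etas = [norm(y[m:]) + delta for m in dims]      # upper bounds for dist(y, F(Z_n))
x_final, steps = multilevel([0.0] * 4, y, dims, etas)
final_residual = norm([fi - yi for fi, yi in zip(forward(x_final), y)])
```

The design point mirrored here is the warm start in (S2): the coarse levels only need to reach the loose tolerance $(3 + \varepsilon)\eta_n$, and the finest level starts from an iterate that is already close to the data.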



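Theorem 4.4. Assume that Assumptions 4.1 and 4.2 hold. Assume that there
exists a finite subset of operators, $\{F_n\}_{n=0}^{N}$ say, from the operator family $\{F_\alpha\}$ such
that

(a) The starting point $x_{0,0}$ is within the first convergence radius, that is,

(4.5) $\Delta_p(x_{0,0}, z_0^\dagger) \le \rho_0,$

where $z_0^\dagger$ denotes the $Z_0$ best approximating solution, i.e.,

$\|F_0(z_0^\dagger) - y\| = \mathrm{dist}(y, F_0(Z_0)),$

and the $Z_0$ convergence radius $\rho_0$ is defined by

$\rho_0 = \frac{C_p}{p}\hat{L}_0^{-p}\left(\frac{1 + \sqrt{1 - 8\mathcal{C}_0\eta_0}}{2\mathcal{C}_0} - 2\eta_0\right)^p$

with $\mathcal{C}_0 = \frac{1}{2}\left(\frac{p}{C_p}\right)^{2/p} L_0 \mathfrak{C}_0^2$.

(b) For every two neighbor levels $Z_n$ and $Z_{n+1}$, n = 0, ..., N - 1, the constants $\eta_n$
and $\eta_{n+1}$, $\mathfrak{C}_{n+1}$, $\hat{L}_{n+1}$, $\mathcal{C}_{n+1}$ satisfy the following inequality

(4.6) $(3 + \varepsilon)\eta_n \le \left(\frac{C_p}{p}\right)^{1/p}(\mathfrak{C}_{n+1}\hat{L}_{n+1})^{-1}\left(\frac{1 + \sqrt{1 - 8\mathcal{C}_{n+1}\eta_{n+1}}}{2\mathcal{C}_{n+1}} - 2\eta_{n+1}\right) - \eta_{n+1},$

where $\mathcal{C}_{n+1} = \frac{1}{2}\left(\frac{p}{C_p}\right)^{2/p} L_{n+1}\mathfrak{C}_{n+1}^2$.

(c) N is the first positive integer such that $\eta_N \le (3 + \varepsilon)^{-1}\hat{\eta}$, that is,

$(3 + \varepsilon)\eta_n > \hat{\eta} \quad \forall n < N$

and

$(3 + \varepsilon)\eta_N \le \hat{\eta}.$

Then, Algorithm 4.3 has the property that it stops after a finite number of iterations
when the discrepancy criterion

(4.7) $\|F_N(x_{N,K_N}) - y\| \le \hat{\eta}$

is satisfied.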



The strategy of the proof is to estimate the decreasing objective function $\|F_n(x_{n,k}) - y\|$
level by level. That is, one applies Theorem 3.3 to guarantee that the discrepancy
criterion (4.4) is attained with a finite number of iterations on each level n. Then,
with (4.6) and (4.4), we show that the initial point $x_{n+1,0}$ on level n + 1, which coincides
with the iteration result $x_{n,K_n}$ on level n, is within the convergence radius $\rho_{n+1}$.
Therefore, the procedure continues until (4.7) is satisfied.
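
The coupling between neighboring levels can be evaluated numerically once the constants of a level are known. The following sketch computes the intermediate constant $\mathcal{C} = \frac{1}{2}(p/C_p)^{2/p} L \mathfrak{C}^2$, the convergence radius in the form $\frac{C_p}{p}\hat{L}^{-p}\left(\frac{1+\sqrt{1-8\mathcal{C}\eta}}{2\mathcal{C}} - 2\eta\right)^p$, and tests the level condition (4.6). The function names and the numerical constants are hypothetical and serve only as an illustration.

```python
import math

def intermediate_constant(p, C_p, L, c_stab):
    """C = (1/2) (p / C_p)^(2/p) * L * c_stab^2."""
    return 0.5 * (p / C_p) ** (2.0 / p) * L * c_stab ** 2

def bracket(c_int, eta):
    """(1 + sqrt(1 - 8 C eta)) / (2 C) - 2 eta, requires 8 C eta < 1."""
    if 8.0 * c_int * eta >= 1.0:
        raise ValueError("need 8 * C * eta < 1")
    return (1.0 + math.sqrt(1.0 - 8.0 * c_int * eta)) / (2.0 * c_int) - 2.0 * eta

def convergence_radius(p, C_p, L_hat, L, c_stab, eta):
    """Radius rho of a single level."""
    c_int = intermediate_constant(p, C_p, L, c_stab)
    return (C_p / p) * L_hat ** (-p) * bracket(c_int, eta) ** p

def coupling_holds(eta_n, eps, p, C_p, L_hat_next, L_next, c_stab_next, eta_next):
    """Check the level condition (4.6) between level n and level n + 1."""
    c_int = intermediate_constant(p, C_p, L_next, c_stab_next)
    rhs = ((C_p / p) ** (1.0 / p) / (c_stab_next * L_hat_next)
           * bracket(c_int, eta_next) - eta_next)
    return (3.0 + eps) * eta_n <= rhs

# Hilbert-space constants p = 2, C_p = 1 with hypothetical level data.
rho = convergence_radius(2.0, 1.0, 1.0, 1.0, 1.0, 0.0)
ok = coupling_holds(0.1, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 0.01)
too_coarse = coupling_holds(0.2, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 0.01)
```

For these numbers the radius is 0.5, the step from $\eta_n = 0.1$ to $\eta_{n+1} = 0.01$ is admissible, and the step from $\eta_n = 0.2$ is not; in that case an intermediate level would be needed.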

Proof. We first adapt the convergence radius, $\rho$, in Theorem 3.3 to an n-level
convergence radius $\rho_n$. For any n-level, n = 0, 1, 2, ..., N, one can use Algorithm 3.2
to obtain an approximate solution to the operator equation

$F_n(x) = y$

with a given starting point $x_{n,0}$ and the discrepancy criterion given in (4.4). If the
starting point $x_{n,0}$ satisfies

(4.8) $\Delta_p(x_{n,0}, z_n^\dagger) \le \rho_n := \frac{C_p}{p}\hat{L}_n^{-p}\left(\frac{1 + \sqrt{1 - 8\mathcal{C}_n\eta_n}}{2\mathcal{C}_n} - 2\eta_n\right)^p,$

where $z_n^\dagger$ denotes the best $Z_n$-approximation, then Theorem 3.3 can be applied to
show that Algorithm 3.2 stops after a finite number of iterations with

$\|F_n(x_{n,K_n}) - y\| \le (3 + \varepsilon)\eta_n$

satisfied. Next, we show that, in particular with condition (4.6), if the starting point
for the present level, $x_{n,0}$, is within the convergence radius, then the starting point
for the next level, $x_{n+1,0}$, which is equal to $x_{n,K_n}$, is within the convergence radius
for the next level. That is to say,

$\Delta_p(x_{n,0}, z_n^\dagger) \le \rho_n$

implies

$\Delta_p(x_{n+1,0}, z_{n+1}^\dagger) \le \rho_{n+1},$

for all n < N. Indeed, for any n < N, according to (4.8) and Theorem 3.3, after $K_n$
steps, the n-level discrepancy criterion,

$\|F_n(x_{n,K_n}) - y\| \le (3 + \varepsilon)\eta_n,$

is satisfied. Then, with the above inequality and (4.3), we estimate

(4.9) $\Delta_p(x_{n+1,0}, z_{n+1}^\dagger)^{1/p} \le \mathfrak{C}_{n+1}\|F_{n+1}(x_{n+1,0}) - F_{n+1}(z_{n+1}^\dagger)\|$
$\le \mathfrak{C}_{n+1}\left(\|F_{n+1}(x_{n+1,0}) - y\| + \|F_{n+1}(z_{n+1}^\dagger) - y\|\right)$
$\le \mathfrak{C}_{n+1}\left((3 + \varepsilon)\eta_n + \eta_{n+1}\right).$

Note that (4.6) leads to the inequality

$\mathfrak{C}_{n+1}\left((3 + \varepsilon)\eta_n + \eta_{n+1}\right) \le \left(\frac{C_p}{p}\right)^{1/p}\hat{L}_{n+1}^{-1}\left(\frac{1 + \sqrt{1 - 8\mathcal{C}_{n+1}\eta_{n+1}}}{2\mathcal{C}_{n+1}} - 2\eta_{n+1}\right).$

Substituting this into (4.9), we have that

(4.10) $\Delta_p(x_{n+1,0}, z_{n+1}^\dagger)^{1/p} \le \left(\frac{C_p}{p}\right)^{1/p}\hat{L}_{n+1}^{-1}\left(\frac{1 + \sqrt{1 - 8\mathcal{C}_{n+1}\eta_{n+1}}}{2\mathcal{C}_{n+1}} - 2\eta_{n+1}\right) = \rho_{n+1}^{1/p}.$

For the last N-level, we apply Theorem 3.3 again to find that

$\|F_N(x_{N,K_N}) - y\| \le (3 + \varepsilon)\eta_N \le \hat{\eta}.$

□



Remark 4.5. We interpret that Algorithm 4.3 is designed to achieve the optimal
(or nearly optimal) accuracy for a feasible starting point. Usually, the finest level bears
both the smallest approximation error, which corresponds to the optimal accuracy, and
the largest stability constant. Note that the definition of the convergence radius (3.11)
shows its algebraically decaying property with respect to the stability constant. There
are cases when only a rough starting point is available. For these cases, one may
fail to obtain a reasonable result using Algorithm 3.2 directly on the finest level but
Algorithm 4.3 leads to a good approximate solution. The condition (4.6) can be
interpreted as a strategy for picking the next finer level, which is characterized by its
stability constant $\mathfrak{C}_{n+1}$, its approximation error $\eta_{n+1}$, and the constants $\hat{L}_{n+1}$ and $\mathcal{C}_{n+1}$.

Theorem 4.4, especially (b), indicates that a sufficient condition for the existence
of such a selection of operators is that the tolerated best-$Z_n$-approximation is within
the convergence radius of $Z_{n+1}$. In fact, this condition comes from a bootstrap type
competition between $\eta_n$ and $\rho_n$.

We give an example of how conditions (b) and (c) in Theorem 4.4 can be satisfied.

Example 4.6. Assume that X and Y are Banach spaces and that we can reindex
the convex subsets $\{Z_\alpha\}$ such that Assumptions 4.1 and 4.2 hold. Moreover, for given
$\hat{\eta} > 0$, the following conditions hold:

(i) The given starting point $x_{0,0}$ is within the first convergence radius $\rho_0$, i.e.,

$\Delta_p(x_{0,0}, z_0^\dagger) \le \rho_0 = \frac{C_p}{p}\hat{L}_0^{-p}\left(\frac{1 + \sqrt{1 - 8\mathcal{C}_0\eta_0}}{2\mathcal{C}_0} - 2\eta_0\right)^p.$

(ii) The approximation error $\eta_\alpha = \lambda e^{-\alpha}(\alpha + 2)^{-1}$ for some constant $\lambda \gg \hat{\eta}$.

(iii) The stability constant $\mathfrak{C}_\alpha = 2e^{\alpha}$.

(iv) The dynamic models of the constants $\hat{L}_\alpha$ and $L_\alpha$, which are related to the Lipschitz
continuity of the Fréchet derivative $DF_\alpha$, are given by

$\hat{L}_\alpha = (\alpha + 1)e^{-\alpha}$ and $L_\alpha = \tau e^{-2\alpha},$

for some constant $\tau$ such that

$0 < \tau < \left(\frac{C_p}{p}\right)^{3/p}\frac{1}{16\lambda(4e + 1)}.$

Now, we can choose the operators $\{F_n\}_{n=0}^{N}$ defined by $F_n = F|_{Z_n}$ and set the uniform
tolerance constant $\varepsilon = 1$ to run Algorithm 4.3, where N is the first integer such that
$4\eta_N \le \hat{\eta}$ is satisfied. Applying Theorem 4.4 we conclude that

$\|F_N(x_{N,K_N}) - y\| \le \hat{\eta}$

is satisfied after a finite number of iterations.
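
For the model $\eta_\alpha = \lambda e^{-\alpha}(\alpha + 2)^{-1}$ with $\varepsilon = 1$, the final level N is simply the first index with $4\eta_N \le \hat{\eta}$. A minimal sketch of that selection follows; the function names and the sample values of $\lambda$ and $\hat{\eta}$ are assumptions for illustration.

```python
import math

def approximation_error(alpha, lam):
    """Model of condition (ii): eta_alpha = lam * exp(-alpha) / (alpha + 2)."""
    return lam * math.exp(-alpha) / (alpha + 2.0)

def stability_constant(alpha):
    """Model of condition (iii): stability constant 2 * exp(alpha)."""
    return 2.0 * math.exp(alpha)

def final_level(lam, eta_hat, eps=1.0, max_level=1000):
    """First integer N with (3 + eps) * eta_N <= eta_hat."""
    for n in range(max_level + 1):
        if (3.0 + eps) * approximation_error(n, lam) <= eta_hat:
            return n
    raise ValueError("no admissible level up to max_level")

N = final_level(lam=1.0, eta_hat=0.1)
errors = [approximation_error(n, 1.0) for n in range(N + 1)]
constants = [stability_constant(n) for n in range(N + 1)]
```

With $\lambda = 1$ and $\hat{\eta} = 0.1$ this gives N = 3: the approximation error drops from 0.5 on level 0 to roughly 0.01 on level 3, while the stability constant grows from 2 to about 40.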

In this example, we can quantify the intermediate constant $\mathcal{C}_n$ and the convergence
radius $\rho_n$ by

$\mathcal{C}_n = 2\tau\left(\frac{C_p}{p}\right)^{-2/p}$

and

$\rho_n = \frac{C_p}{p}\hat{L}_n^{-p}\left(\frac{1 + \sqrt{1 - 8\mathcal{C}_n\eta_n}}{2\mathcal{C}_n} - 2\eta_n\right)^p.$

Noting that

$\frac{1}{2} < 1 - 4\mathcal{C}_n\eta_n < 1 + \sqrt{1 - 8\mathcal{C}_n\eta_n} - 4\mathcal{C}_n\eta_n < 2 - 4\mathcal{C}_n\eta_n < 2,$

for n = 0, 1, ..., N, we conclude that, for the convergence radius $\rho_n$, the dynamic
model is

(v) $\left(\frac{C_p}{p}\right)^{3}(8\tau)^{-p}(n + 1)^{-p} \le \rho_n \le \left(\frac{C_p}{p}\right)^{3}(2\tau)^{-p}(n + 1)^{-p}.$

Let us assume that we are in a situation where only a rough starting point x is
available such that

(4.11) $\Delta_p(x, z_0^\dagger) \le \left(\frac{C_p}{p}\right)^{3}(8\tau)^{-p} \le \rho_0$

but

(4.12) $\Delta_p(x, z_N^\dagger) > \left(\frac{C_p}{p}\right)^{3}(2\tau)^{-p}(N + 1)^{-p} \ge \rho_N.$

If we run Algorithm 3.2 for the single 0-level, by (4.11), Theorem 3.3 can be applied
but the optimal residue estimate we can expect cannot be smaller than the 0-level
approximation error $\eta_0 = \lambda/2 \gg \hat{\eta}$. If we run Algorithm 3.2 for the single N-level,
according to (4.12), there is no guarantee that Algorithm 3.2 will stop after a finite
number of iterations nor yield a reasonable result. Hence a multilevel approach, as
Algorithm 4.3, is proposed to obtain a high-accuracy approximation $x_{N,K_N}$ satisfying

$\|F(x_{N,K_N}) - y\| \le \hat{\eta}.$



5. Discussion. We discuss a steepest descent iteration method for solving non- 
linear operator equations in Banach spaces. Provided that the nonlinearity of the 
forward operator obeys a Lipschitz type stability in a convex and closed subset of 
the preimage space, we could prove a restricted convergence result and provide an 
estimate of the error decrease. Based on the analysis of the radius of convergence, we 
introduce a multilevel method and obtain a sufficient condition on the choices of the 
parameters, mainly on the approximation errors and stability constants. 

As an example, we mention inverse boundary value problems for the Helmholtz
equation. Indeed, stability estimates satisfying Assumption 4.2 have been obtained in
[5], where $Z_\alpha$ represents a space spanned by a finite linear combination of piecewise
constant functions. Using our multi-level algorithm, we arrive at a convergence result
by successive approximation. This result can be further improved, using the same al- 
gorithm, by combining different frequencies and exploiting the frequency dependence 
of the stability constants. The idea of using multiple frequencies was proposed by 
Chen [13], who introduced an algorithm based on recursive linearization. The algo- 
rithm starts with an initial guess at the lowest frequency, which typically captures 
the coarse scale variations in the wavespeed. Then the Born approximation is invoked 
[T51 [SJ [7J . The iteration is based on a linearization of the inverse problem at the 
present frequency. By progressively increasing the frequency and carrying out the iter- 
ations, increasingly finer details are added to the wavespeed model until a sufficiently 
accurate result is obtained. In [7], the convergence of this algorithm was established
under certain conditions. As a direct application of Theorem 4.4, the convergence
of this algorithm can be revisited. Especially, (4.6) offers a strategy for picking the
frequencies and regularization parameters. 